Abstract. We examine the relation between topological entropy, invertibility, and
prediction in topological dynamics. We show that topological determinism in the sense of Kamiński, Siemaszko and Szymański imposes no restriction on invariant measures except zero entropy. Also, we develop a new method for relating topological determinism and zero entropy, and apply it to obtain a multidimensional analog of this theory. We examine prediction in symbolic dynamics and show that while the condition that each past admit a unique future never occurs for interesting systems, the condition that each past have a bounded number of futures imposes no restriction on invariant measures except zero entropy. Finally, we give a negative answer to a question of Eli Glasner by constructing a zero-entropy system with a globally supported ergodic measure in which every point has multiple preimages.


1. Introduction

There are several ways to define "determinism" of a dynamical system, all of which express in various ways the idea that the past determines the future (and vice versa). In ergodic theory, a measure-preserving map $T$ of a probability space $(X, \mathcal{B}, \mu)$ is deterministic if for every measurable $f : X \to \mathbb{R}$ (or equivalently every finite-valued $f$) the values $f(Tx), f(T^2x), \ldots$ determine $f(x)$ with probability one, i.e. $f \in \sigma(Tf, T^2f, \ldots)$, where $\sigma(\mathcal{F})$ is the $\sigma$-algebra generated by $\mathcal{F}$. Another equivalent condition is that every factor $(Y, \mathcal{C}, \nu, S)$ of $(X, \mathcal{B}, \mu, T)$ is essentially invertible, i.e. there is an invariant set $Y_0 \subseteq Y$ of full measure such that $S|_{Y_0}$ is invertible. Yet another equivalent condition which is widely used is that entropy vanish: $h(T, \mu) = 0$.

In this work we examine the relations between prediction, invertibility and entropy in the category of topological dynamics, where by a topological dynamical system $(X, T)$ we mean a continuous onto map $T : X \to X$ of a compact metric space. One can find analogs of these three conditions, but the relations between them are more complex. We present here several results that underscore the independence of these notions, complementing some of the recent works on the subject, e.g. [3].

1.1. Topological predictability. Kamiński, Siemaszko and Szymański introduced in [5] an interesting and natural notion of predictability for topological systems. A system $(X, T)$ is topologically predictable, or TP, if for every continuous function $f \in C(X)$ we have $f \in \langle 1, Tf, T^2f, \ldots \rangle$, where $\langle \mathcal{F} \rangle \subseteq C(X)$ denotes the closed algebra generated by a family $\mathcal{F} \subseteq C(X)$. Kamiński et al. showed that $(X, T)$ is topologically predictable if and only if every factor of $(X, T)$ is invertible, where a factor is a system $(Y, S)$ and a continuous onto map $\pi : X \to Y$ such that $\pi T = S\pi$.

One would like to understand what other dynamical implications topological predictability has. In [5] it was shown that TP systems have zero topological entropy, but that the converse is false. This follows easily from the fact that every totally disconnected TP system is equicontinuous, whereas every zero entropy measure can be realized as an invariant measure on a totally disconnected (and hence not TP) system.

Nonetheless, although "not TP" seems to say little, TP is a rather strong condition, 
and one might suppose it to impose restrictions on the measurable dynamics. This 
is supported by the fact that only two classes of TP systems were known beyond the 
equicontinuous case: the distal systems and the pointwise rigid systems (that these 
are TP follows from [5]). 

Our first result is that TP imposes no restrictions on invariant measures except 
zero entropy: 

Theorem 1.1. For every zero-entropy, ergodic measure-preserving system $(X, \mathcal{B}, \mu, T)$ there is a topological system $(Y, S)$ and an invariant measure $\nu$ on $Y$ such that $(Y, \nu, S) \cong (X, \mathcal{B}, \mu, T)$ and for every $y', y''$ in $Y$, the point $(y', y'')$ is forward recurrent for $S \times S$. In particular, $(Y, S)$ is TP.

This construction is related to the construction in [9] which as a by-product pro- 
duces, for any zero entropy measure preserving system, a topological model in which 
every pair is two-sided recurrent. However, this is a far weaker statement than for- 
ward recurrence; in fact, the realization in [9] is on a totally disconnected space, 
and one cannot hope that such a system will be TP (for then the action would be 
equicontinuous, and the invariant measures would have pure point spectrum). 

As a consequence of this one gets a new functional characterization of the vanishing of entropy in measure preserving systems:

Corollary 1.2. A measure preserving system $(X, \mathcal{B}, \mu, T)$ has zero entropy if and only if there exists a separable subalgebra $A \subseteq L^\infty(\mu)$ which separates points and such that $f \in \langle 1, Tf, T^2f, \ldots \rangle$ for every $f \in A$.

(Footnote: Kamiński et al. use the term topological determinism for TP, but this seems to us confusing in the present context.)

Next we extend the notion of TP to $\mathbb{Z}^d$ actions. Such an action $\{T^u\}_{u \in \mathbb{Z}^d}$ of $\mathbb{Z}^d$ by homeomorphisms on $X$ is topologically predictable (TP) if $f \in \langle 1, T^u f : u < 0 \rangle$ for every $f \in C(X)$; here $<$ is the lexicographical ordering on $\mathbb{Z}^d$. One may ask whether this notion is independent of the generators (the lexicographic ordering certainly is not). It is not; even in dimension 1, the property TP depends on the generator, i.e. TP for $T$ does not imply it for $T^{-1}$. Thus TP is a property of a group action and a given set of generators.
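
To make the role of the lexicographical ordering concrete, the following minimal Python sketch enumerates the "past" $\{u \in \mathbb{Z}^d : u < 0\}$ inside a finite box. The helper names `is_past` and `past_in_box` are illustrative assumptions and do not come from the paper; the sketch only relies on the fact that Python compares tuples lexicographically.

```python
from itertools import product

def is_past(u):
    """Return True when u < 0 in the lexicographic order on Z^d (u is a tuple of ints)."""
    zero = (0,) * len(u)
    return tuple(u) < zero  # Python tuple comparison is lexicographic

def past_in_box(d, radius):
    """All u in {-radius, ..., radius}^d that lie in the lexicographic past."""
    rng = range(-radius, radius + 1)
    return [u for u in product(rng, repeat=d) if is_past(u)]

# The past inside the 3 x 3 box around the origin of Z^2 (hypothetical example).
past = past_in_box(2, 1)
```

Note that the sign of the first nonzero coordinate decides membership, so replacing a generator by its inverse changes which translates count as the past.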

The proof in [5] that TP implies zero entropy for a single transformation used the existence of asymptotic pairs in positive entropy systems. In section 3.3 we give a new and direct argument for this implication, which is somewhat more transparent. Furthermore, our proof can be used to generalize the result to actions of $\mathbb{Z}^d$.

Theorem 1.3. For a $\mathbb{Z}^d$-action, TP implies zero topological entropy.

There is a rather complete theory of entropy, developed by Ornstein and Weiss, for actions of amenable groups on probability spaces. One feature which is absent from the general theory (and which we utilized for $\mathbb{Z}$ and $\mathbb{Z}^d$ actions) is a good notion of the "past" of an action, and the ability to represent the entropy of a partition as a conditional entropy of the partition with respect to the "past". However, by analogy to the abelian case the following question is natural:

Problem 1.4. Suppose an infinite discrete amenable group $G$ acts by homeomorphisms on $X$. Let $S \subseteq G$ be a subsemigroup not containing the unit of $G$, and such that $S \cup S^{-1}$ generates $G$. Suppose that for every $f \in C(X)$ we have $f \in \langle 1, sf : s \in S \rangle$. Does this imply that $h(X, G) = 0$?

1.2. Prediction for symbolic systems. Let $\Sigma$ be a finite set of symbols and consider the space $\Sigma^{\mathbb{Z}}$ of bi-infinite sequences over $\Sigma$. Denote by $\sigma : \Sigma^{\mathbb{Z}} \to \Sigma^{\mathbb{Z}}$ the shift map. A symbolic system is a closed, $\sigma$-invariant subset of $\Sigma^{\mathbb{Z}}$.

Let $X \subseteq \Sigma^{\mathbb{Z}}$ be a subshift and let $x^- \in \Sigma^{-\mathbb{N}}$; for $x \in \Sigma^{\mathbb{Z}}$ we also write $x^- = (\ldots, x(-2), x(-1))$. A finite or infinite sequence $x^+ \in \bigcup_{1 \le n \le \infty} \Sigma^n$ is an admissible extension of $x^-$ (with respect to $X$) if the concatenation $x^- x^+$ is in $X$. If $h(X) = 0$ then $h(\mu) = 0$ for every invariant measure $\mu$ on $X$, and so there is a set of points $X_0 \subseteq X$ having full measure with respect to every invariant measure, such that $x^-$ has a unique extension for every $x \in X_0$. A natural question is whether this can occur for every $x \in X$. The answer is no: in fact, it is well known that the only subshifts for which every admissible past $x^-$ admits a unique continuation are finite unions of periodic orbits (we give a proof in lemma 4.1).

However, there do exist subshifts where each $x^- \in X^-$ allows only finitely many extensions; the best known are probably the Sturmian sequences. Such subshifts must have zero entropy. It turns out that such systems are not uncommon, and that zero entropy is again the only restriction on the dynamics of their invariant measures:

Theorem 1.5. Every ergodic measure-preserving system with entropy zero can be realized as an invariant Borel measure on a uniquely ergodic subshift $X \subseteq \{0, 1\}^{\mathbb{Z}}$ which has the property that every $x^- \in \{0, 1\}^{-\mathbb{N}}$ has at most two infinite extensions.

This may be viewed as a sharpening of the Jewett-Krieger generator theorem, which states that every measure-preserving system with finite entropy $h$ can be realized as the unique invariant measure on a uniquely ergodic subshift on $k$ symbols, provided $\log k > h$. In zero entropy, one cannot use fewer than 2 symbols. This theorem says that one can do the next best thing.
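
As a finite-word illustration of the Sturmian sequences mentioned above, the sketch below uses the Fibonacci word (a standard Sturmian example, generated by the substitution 0 -> 01, 1 -> 0; this choice and all function names are assumptions made for illustration, not taken from the paper). It counts the distinct subwords of each length and the symbols that can follow each subword. This is only a finite analogue of the statement about infinite pasts.

```python
def fibonacci_word(iterations):
    """Iterate the substitution 0 -> 01, 1 -> 0 starting from '0'."""
    w = "0"
    for _ in range(iterations):
        w = "".join("01" if c == "0" else "0" for c in w)
    return w

def factors(word, n):
    """Set of distinct subwords of length n occurring in word."""
    return {word[i:i + n] for i in range(len(word) - n + 1)}

def right_extensions(word, n):
    """Map each length-n subword to the set of symbols that follow it in word."""
    ext = {}
    for i in range(len(word) - n):
        ext.setdefault(word[i:i + n], set()).add(word[i + n])
    return ext

w = fibonacci_word(20)  # a long prefix of the Fibonacci word

# For each n: (number of subwords of length n, number of subwords with two continuations)
summary = {
    n: (len(factors(w, n)),
        sum(1 for s in right_extensions(w, n).values() if len(s) == 2))
    for n in range(1, 11)
}
```

In this sample every finite word has at most two one-symbol continuations, and exactly one word of each length has two, which is the finite shadow of the "bounded number of futures" condition.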

1.3. Non-invertibility and entropy. Consider a symbolic system $X \subseteq \Sigma^{\mathbb{N}}$ (note that we now have a one-sided shift), and an invariant probability measure $\mu$ on $X$. Recall that, since the partition of $X$ according to the first symbol generates the $\sigma$-algebra, the entropy $h(\mu)$ is the average of the entropy of the conditional measures, given $x$, induced on the preimage set $\sigma^{-1}(x)$. Thus if $h(\mu) > 0$ then with positive probability $\sigma^{-1}(x)$ is not concentrated on a single point, and consequently there is a large set of points in $X$ with multiple preimages. It is therefore natural to ask, what "degree" of non-invertibility is necessary to guarantee positive entropy?

One plausible condition is that each point have multiple preimages; we call such a system totally non-invertible. Indeed, for subshifts this is enough to imply positive entropy, because for symbolic systems total non-invertibility implies a stronger condition: the preimage of every point has diameter $> \delta$ for some positive $\delta$. If this condition is satisfied we say that the system has no small preimages. An easy argument shows that a map with no small preimages has entropy at least $\log 2$ (see proposition 5.1 below).

Total non-invertibility does not guarantee positive entropy in general, though in some special cases it does, e.g. maps of the interval [1]. One would like to find additional hypotheses which, together with total non-invertibility, imply positive entropy. One candidate is the presence of a globally supported ergodic measure. In a totally non-invertible system there is always an open set of points whose preimages have diameter which is bounded below by some positive constant, and by ergodicity almost every orbit will spend a positive fraction of its time in this set. One would hope to use this to construct many well-separated orbits. Eli Glasner has raised the question of whether this hypothesis indeed implies positive entropy. We show below that this is false.

Example 1.6. There exists a zero-entropy totally non-invertible system with a globally supported ergodic measure.

For an integer $k > 0$ we say that a system $(X, T)$ is at least $k$-to-one if the preimage set of every point is of size at least $k$. J. Bobok has shown that if a map of the circle or the interval is $k$-to-one then $h(T) \ge \log k$, and has asked if this holds in general, at least under the assumption that there are no small preimages. We can give a negative answer to this:

Example 1.7. There exists an infinite-to-one system (X, T) with no small preimages 
which supports a global ergodic invariant measure but h(X,T) = log 2. 

There seems to be no obstruction in our examples to making the measures weakly 
mixing, and possibly strong mixing, but we do not pursue this here. 

The question remains if this can happen for a continuous map on a manifold. For 
smooth maps it cannot; see [2]. 

Acknowledgement. This work was done during the author's graduate studies. I would 
like to thank Benjamin Weiss for his constant encouragement and for raising some of 
the questions addressed here. 

2. Notation 

We will use freely standard facts about topological dynamics and entropy which 
can be found e.g. in [8]. This section contains some further notation for dealing with 
sequence spaces, which will be useful later. 

Let $\Sigma$ be a set. Write $\Sigma^*$ for the set of all finite words over $\Sigma$. The $i$-th letter of a word $a$ is denoted by $a(i)$. If $a = a(1)a(2) \ldots a(k)$ then $k$ is the length of $a$ and is denoted by $\ell(a)$. We denote the concatenation of words $a, b \in \Sigma^*$ by $ab$.

Similarly, we define the spaces of one-sided sequences $\Sigma^{\mathbb{N}}$, $\Sigma^{-\mathbb{N}}$ (we use the convention $\mathbb{N} = \{1, 2, 3, \ldots\}$) and of two-sided sequences, $\Sigma^{\mathbb{Z}}$. If a topology is given on $\Sigma$ these sequence spaces carry the product topology; for finite $\Sigma$ we take the discrete topology on $\Sigma$. We denote by $\sigma$ the shift map on these spaces, which is defined by the formula $(\sigma(x))(i) = x(i + 1)$; this map restricted to $\Sigma^{\mathbb{N}}$ and $\Sigma^{\mathbb{Z}}$ is continuous and onto, and is a homeomorphism in the two-sided case. In the one-sided case the preimage set of every point is isomorphic to $\Sigma$.

MICHAEL HOCHMAN

We also define the shift on $\Sigma^*$ in the obvious way, by

\[ \sigma(x(1)x(2) \ldots x(k)) = x(2)x(3) \ldots x(k) \]

(note that $\sigma^n(ab) = (\sigma^n a)b$ if $n < \ell(a)$ but is equal to $\sigma^{n - \ell(a)}(b)$ if $\ell(a) \le n < \ell(a) + \ell(b)$. Otherwise it is the empty word). When concatenating infinite sequences, we adopt the convention that if $x \in \Sigma^{-\mathbb{N}}$ and $y \in \Sigma^{\mathbb{N}}$ then $xy \in \Sigma^{\mathbb{Z}}$ is the sequence obtained by shifting $y$ one symbol to the left and concatenating (note that neither $x$ nor $y$ is defined at index 0).
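
The shift on finite words and the identity for $\sigma^n(ab)$ stated above can be checked mechanically. The following is a minimal sketch in which words are Python strings; the names `shift` and `check_identity` are hypothetical helpers introduced only for this illustration.

```python
def shift(word, n=1):
    """sigma^n on a finite word: drop the first n letters (empty word if n >= len(word))."""
    return word[n:]

def check_identity(a, b, n):
    """Verify the three cases of the formula for sigma^n(ab), for n >= 0."""
    ab = a + b
    if n < len(a):
        return shift(ab, n) == shift(a, n) + b
    if n < len(a) + len(b):
        return shift(ab, n) == shift(b, n - len(a))
    return shift(ab, n) == ""

# Check all three cases on a small example.
all_cases_hold = all(check_identity("abc", "de", n) for n in range(8))
```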

For a word $x$ (finite or infinite), if $x = ab$ then $a$ is called a front segment of $x$ (if $\ell(a) = k$ then $a$ is a front $k$-segment of $x$), and $b$ a back segment of $x$. For $a, b \in \Sigma^*$ we say that $a$ is a subword of $b$ at index $i$ if $i \le \ell(b) - \ell(a) + 1$ and $a(j) = b(i + j)$ for $j = 1, \ldots, \ell(a)$. The index $i$ is called the alignment of $a$ in $b$. If such an $i$ exists we say that $a$ appears in $b$, or that it is a subword of $b$.

We denote by $[i; j]$ the segment of consecutive integers $[i, j] \cap \mathbb{Z}$, and denote by $x|_{[i;j]} = x(i)x(i + 1) \ldots x(j)$ the subword of $x$ determined by $[i; j]$, provided $x$ is long enough for this to make sense.

All measures are assumed to be Borel probability measures. 

3. Topological predictability 

3.1. Entropy, recurrence and TP. A topologically predictable system has zero 
topological entropy, and therefore by the variational principle every invariant measure 
on it has entropy zero. In this section we show that this is the only restriction on 
invariant measures. The construction is rather technical; we emphasize that this 
section is not used in the sequel. 

A point $x$ in a dynamical system $(X, T)$ is forward recurrent if $T^{n(k)}x \to x$ for some sequence of times $n(k) \to \infty$. If every point in a system is forward recurrent then every closed subset $A \subseteq X$ which is forward invariant (i.e. $TA \subseteq A$) is invariant, i.e. $T^{-1}A = TA = A$.

In order to construct a TP system supporting a given measure we shall use the following fact: if $(X, T)$ has the property that every point in $X \times X$ is forward recurrent, then it is TP. Indeed, this implies that every forward invariant, closed equivalence relation $R \subseteq X \times X$ is also invariant under $T^{-1}$; this is equivalent to the property that every factor is invertible, so $(X, T)$ is topologically predictable [5]. B. Weiss has shown in [9] that every invariant measure can be realized on a symbolic system for which each pair is two-sided recurrent (i.e. $(T^{n(k)}x, T^{n(k)}y) \to (x, y)$ for some $|n(k)| \to \infty$). This does not imply the result we want, but our methods will be related to his. We remark that our construction cannot be symbolic, since infinite symbolic systems always contain forward-asymptotic pairs. We shall instead construct a connected subshift of $[0, 1]^{\mathbb{N}}$.



3.2. Realization of measures on TP systems. We begin the proof of theorem 1.1. Recall that $(X, \mathcal{B}, \mu, T)$ is a measure-preserving system with zero entropy, and we wish to construct a space $Y$ and homeomorphism $S : Y \to Y$ for which every pair is forward recurrent and which supports a measure isomorphic to $(X, \mathcal{B}, \mu, T)$.

For the construction we may assume by [6] that T is a minimal, strictly ergodic 
homeomorphism of a totally disconnected metric space X and that there exists a 
clopen generator for T. We may also assume that (X, T) is topologically mixing. 

Given a measurable function $f : X \to [0, 1]$ let $f^{(m)} : X \to [0, 1]^m$ denote the function $x \mapsto (f(x), f(Tx), \ldots, f(T^{m-1}x))$, and similarly $f^{(\infty)} : X \to [0, 1]^{\mathbb{N}}$ the map $x \mapsto (f(x), f(Tx), f(T^2x), \ldots)$. We use the notation $\|a\|_\infty = \sup |a_i|$ for $a \in \mathbb{R}^m$ or $a \in \mathbb{R}^{\mathbb{N}}$.

For integers $m, r$ we say that $f$ is $(m, r)$-good if there is a subset $X_{f,m,r} \subseteq X$ of full measure such that for every $x', x'' \in X_{f,m,r}$ there is an integer $0 < k < r$ (which may depend on $x', x''$) satisfying

\[ \|f^{(m)}(x') - f^{(m)}(T^k x')\|_\infty < \frac{1}{m} \]
\[ \|f^{(m)}(x'') - f^{(m)}(T^k x'')\|_\infty < \frac{1}{m} \]
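
The definition of an $(m, r)$-good function can be explored numerically on sample points. The sketch below is only an illustration under stated assumptions: the system is an irrational circle rotation and the observable is $f(x) = (1 + \cos 2\pi x)/2$, neither of which appears in the paper, and the full-measure requirement is replaced by a check on one chosen pair of points. All names (`ALPHA`, `T`, `f`, `itinerary`, `common_return_time`) are hypothetical.

```python
import math

ALPHA = (math.sqrt(5) - 1) / 2  # rotation angle, an illustrative choice

def T(x):
    """Circle rotation x -> x + ALPHA (mod 1)."""
    return (x + ALPHA) % 1.0

def f(x):
    """A continuous observable with values in [0, 1]."""
    return (1 + math.cos(2 * math.pi * x)) / 2

def itinerary(x, m):
    """f^(m)(x) = (f(x), f(Tx), ..., f(T^(m-1) x))."""
    out = []
    for _ in range(m):
        out.append(f(x))
        x = T(x)
    return out

def sup_dist(a, b):
    """Sup-norm distance between two vectors of equal length."""
    return max(abs(s - t) for s, t in zip(a, b))

def common_return_time(x1, x2, m, r):
    """Smallest 0 < k < r with both sup-distances below 1/m, or None if there is none."""
    base1, base2 = itinerary(x1, m), itinerary(x2, m)
    y1, y2 = x1, x2
    for k in range(1, r):
        y1, y2 = T(y1), T(y2)
        if (sup_dist(base1, itinerary(y1, m)) < 1 / m
                and sup_dist(base2, itinerary(y2, m)) < 1 / m):
            return k
    return None

k = common_return_time(0.1, 0.7, 3, 20)
```

For a rotation the same return time serves every pair of points, which is why this example is easy; the difficulty addressed in the text is to obtain such common return times for an arbitrary zero-entropy system.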



Suppose that $f$ is $(m, r(m))$-good for some sequence $r(m)$. Setting $X_0 = \bigcap_{m=1}^{\infty} X_{f,m,r(m)}$, the above holds for every $x', x'' \in X_0$ and all $m \in \mathbb{N}$. If we set $\nu = f^{(\infty)}\mu$ and $Y = \operatorname{supp} \nu \subseteq [0, 1]^{\mathbb{N}}$, it follows that each pair of points in $Y$ is forward recurrent for the shift $\sigma$. Also, $\nu$ is shift invariant on $(Y, \sigma)$ and $f^{(\infty)}$ is a factor map from $X$ to $Y$; if the partition induced by $f$ on $X$ generates for $T$ then this is an isomorphism. Thus the theorem will follow once we construct a function $f$ as above.

We construct $f$ by approximation. More specifically, we define a sequence of functions $f_n : X \to [0, 1]$ and integers $r(n)$ such that $f_n$ is $(m, r(m))$-good for each $m \le n$. The sequence $f_n$ will converge almost surely to a function $f$, which is clearly $(m, r(m))$-good for $m \in \mathbb{N}$. Also, each $f_n$ will generate for $T$ and we will guarantee that $f$ generates by controlling the speed of convergence of $f_n$ to $f$. The $f_n$'s will be continuous and each will take on only finitely many values, so we may identify them with finite partitions $P_n$ of $X$ into clopen sets, where $f_n(x) = i$ if and only if $x$ is in the $i$-th partition element of $P_n$ ($i$ may take on non-integer values).

The construction proceeds by induction. Our induction hypothesis will be that we are given a function $f_n$ arising from a finite clopen generating partition $P_n$, and integers $r(1), \ldots, r(n)$ so that $f_n$ is $(m, r(m))$-good for $m = 1, \ldots, n$. For any $\varepsilon$, we will show how to define $f_{n+1}$ and $r(n + 1)$ satisfying the same condition with $n + 1$ in place of $n$, and such that

\[ \mu(x \in X : f_n(x) \ne f_{n+1}(x)) < \varepsilon \]

By choosing $\varepsilon = \varepsilon(n)$ to decrease rapidly enough this last condition guarantees that $f_n \to f$ almost surely, and that $f$ generates for $T$.

Suppose then that we are given $f_n$, $r(1), \ldots, r(n)$ and $\varepsilon > 0$. First, note that the properties of these objects are completely determined by the itineraries of length $r(n) + n$ associated under $f_n$ to points in $X$, i.e. by the image $f_n^{(r(n)+n)}(X)$ of $X$. The following lemma, whose proof we omit, says that the desired properties of the blocks continue to hold if we modify itineraries in a sufficiently slow way:

Lemma 3.1. For $f_n$, $P_n$, $r(1), \ldots, r(n)$ as above, there is a number $0 < \rho <$ […] with the following property. Suppose $y', y'' \in [0, 1]^{r(n)+n}$ are blocks appearing in $f_n^{(r(n)+n)}(X)$ and $a', a'' \in [0, 1]^{r(n)+n}$ have the property that $|a'(i) - a'(i + 1)| < \rho$ and $|a''(i) - a''(i + 1)| < \rho$ for all $1 \le i \le r(n) + n - 1$. Define $z', z'' \in [0, 1]^{r(n)+n}$ by $z'(i) = a'(i) \cdot y'(i)$ and $z''(i) = a''(i) \cdot y''(i)$. Then for each $m \le n$ there exists $0 < k < r(m)$ with $|z'(i) - z'(i + k)| < \frac{1}{m}$ and $|z''(i) - z''(i + k)| < \frac{1}{m}$ for $i = 1, 2, \ldots, n$.

Let $Y \subseteq [0, 1]^{\mathbb{N}}$ be the symbolic subshift defined by the property that every block of length $r(n) + n$ in $Y$ appears in $f_n^{(\infty)}(X)$. Note that $Y$ is a shift of finite type and is irreducible because $X$ is topologically mixing. In particular, there is an integer $D$ so that given two blocks $a, c$ appearing in $Y$, there is a block $b_k$ of length $k$ for every $k \ge D$ such that $a b_k c$ appears in $Y$. We can also fix a block $a^*$ appearing in $Y$ which contains a copy of every $n$-block in $Y$. Increasing $D$ or lengthening $a^*$ if necessary, we may assume that $a^*$ is of length $D$. We may also assume without loss of generality that $D > 1/\varepsilon$.

We need the following, which is a specialized version of lemma 2 from [9]:

Lemma 3.2. There exist $\delta > 0$ and $T_0 \in \mathbb{N}$ such that for all $T > T_0$ there is a family $I$ of subsets of $\{0, \ldots, T - 1\}$ satisfying

(1) $|I| \ge 2^{\delta T}$,

(2) For $A \in I$ and distinct $u, v \in A$, we have $|u - v| >$ […],

(3) For each $A, B \in I$ and $k <$ […], $A \cap (B + k) \ne \emptyset$.

We use the lemma in conjunction with the following simple fact:

Lemma 3.3. Fix $T$ and let $A, B \subseteq \{0, 1, \ldots, T\}$ satisfy the three conditions of the previous lemma. Fix $0 \le k <$ […] $- n$, and let $z', z'' \in [0, 1]^{\mathbb{N}}$ so that $a^*$ appears in $z'$ at each index $i \in A$ and in $z''$ at each index $j \in B + k$. Then for every pair $a, b$ of $n$-blocks from $Y$, there is an index $u$ so that $a$ appears in $z'$ at $u$ and $b$ appears in $z''$ at $u$.

Let $\rho$, $\delta$, $T_0$ be as in the preceding lemmas. Since $(X, T, \mu)$ has zero topological entropy it follows that we can choose an integer $H > \frac{10}{\varepsilon\rho} T_0$ and large enough so that $2^{\delta(\varepsilon\rho/10)H}$ is greater than the number of $(P_n, H)$-names in $X$. We fix such an integer $H$ and construct a Kakutani skyscraper over some clopen set $B \subseteq X$ of small enough measure so that all columns of the tower have height $H$ or $H + 1$. The tower may be made to fill all of $X$ because $(X, T)$ is minimal. Purify the columns according to $P_n$, and let $B_1, \ldots, B_N$ be the bases of the purified columns, so $\{B_1, \ldots, B_N\}$ is a clopen partition of $B$. Let $h(i)$ denote the height of the column over $B_i$. Note that the $P_n$-name of each column is in $Y$.

Divide each column into $\frac{10}{\varepsilon\rho}$ blocks of length $\frac{\varepsilon\rho}{10}H$ (which we assume for convenience is an integer), and possibly an additional level in those columns which are of height $H + 1$. We proceed to modify $P_n$ as follows.

• In each column, modify the bottom $1 + \frac{1}{\rho}$ blocks so that they are identical, and similarly for the top $1 + \frac{1}{\rho}$ blocks; and do so in such a way that the name of the entire column is admissible for $Y$. This can be done because $\frac{\varepsilon\rho}{10}H$, the length of each block, is much larger than $D$. Notice that by choice of $\rho$, the first and last $n + 1$ blocks in each column are identical.

• To each block, except the top and bottom $n$ blocks of each column, assign a distinct set $A \subseteq \{0, \ldots, \frac{\varepsilon\rho}{10}H - 1\}$ such that $|u - v| >$ […] for distinct $u, v \in A$, and if $A, B$ are assigned to distinct blocks and $0 \le k <$ […] then $A \cap (B + k) \ne \emptyset$. We can do this by the choice of $H$ and the lemma. To the bottom $n$ blocks in each column assign the same set $A$ which is assigned to the $(n+1)$-st block of that column, and similarly to the top $n$ blocks assign the same set which is assigned to the $(n+1)$-st block from the top. We have thus assigned sets to each block.

• For a block $b$ appearing in one of the columns and $A$ the set associated to it, we modify $b$ as follows. For convenience in this paragraph we renumber the coordinates of $b$ from $0$ to $\frac{\varepsilon\rho}{10}H - 1$, no matter where in the column $b$ actually appears. For each $i \in A$ we replace the block of length $D$ in $b$ starting at $i$ with the block $a^*$. Next, modify the symbols from $i - D$ to $i - 1$ and from $i + D$ to $i + 2D - 1$ in such a way that the entire block from $i - 2D$ to $i + 3D$ appears in $Y$; we can do this by the definition of $D$. All in all, we have changed $b$ from index $i - D$ to index $i + 2D - 1$. Because of the distance between successive elements of $A$, these changes for different $i \in A$ occur at different places in $b$ and the changes do not interfere with each other.

Note that the bottom $n + 1$ blocks of each column are still identical, as are the $n + 1$ top blocks.

Denote by $\tilde{P}_{n+1}$ the partition obtained so far, and by $\tilde{f}_{n+1}$ the corresponding function.

• If $b_1, b_2, \ldots, b_{1/\rho}$ are the bottom $\frac{1}{\rho}$ blocks of some column, replace $b_k$ with $(k - 1)\rho \cdot b_k$, where $\alpha \cdot b_i$ is the block obtained by multiplying each coordinate of $b_i$ by $\alpha$. Similarly, if $c_1, c_2, \ldots, c_{1/\rho}$ are the top $\frac{1}{\rho}$ blocks of a column, replace $c_k$ with $(\frac{1}{\rho} - k)\rho \cdot c_k$.

• For columns of height $H + 1$, replace the top symbol with 0.

• Perturb the first symbol of each column by less than $\varepsilon$ in a way that the name of each column is unique.

Let $f_{n+1}$ be the function defined by the revised partition; we claim that it has the desired properties for some integer $r(n + 1)$.

We first estimate the measure of points on which $f_n$ and $f_{n+1}$ differ. It suffices to show that in each column the fraction of levels modified is less than $\varepsilon$. The change to the top and bottom $\frac{1}{\rho}$ blocks amounts to $\frac{2}{\rho}$ blocks out of $\frac{10}{\varepsilon\rho}$, which is $\frac{\varepsilon}{5}$ of the levels. Of the intermediate levels, since in the sets $A$ associated to the blocks the distance between elements is at least […] and each element causes a change of $3D$ symbols to its block, here too we have caused a change to at most a […]-fraction of the levels. The change to the top symbol of columns of height $H + 1$ amounts to less than $\frac{1}{H}$ of the space. Thus we have indeed modified $f_n$ on a set of measure less than $\varepsilon$.

We now show that we can choose $r(n + 1)$ so that $f_{n+1}$ is $(m, r(m))$-good for each $m \le n + 1$. Note that every block in $f_{n+1}^{(\infty)}(X)$ of length $r(n) + n$ is of the form described in lemma 3.1, so for $m \le n$ the conclusion follows immediately from that lemma.

We must show that $f_{n+1}$ is $(n + 1, r)$-good for some $r$. Let $x', x'' \in X$. We must show that there is a $k$ of bounded size such that

\[ \|f_{n+1}^{(n+1)}(x') - f_{n+1}^{(n+1)}(T^k x')\|_\infty < \frac{1}{n+1} \quad \text{and} \quad \|f_{n+1}^{(n+1)}(x'') - f_{n+1}^{(n+1)}(T^k x'')\|_\infty < \frac{1}{n+1}. \]

Denote $y' = \tilde{f}_{n+1}^{(\infty)}(x')$ and $y'' = \tilde{f}_{n+1}^{(\infty)}(x'')$, and also $z' = f_{n+1}^{(\infty)}(x')$ and $z'' = f_{n+1}^{(\infty)}(x'')$. We distinguish several cases.

Case 1. Both $x', x''$ are in the top block or level $H + 1$ of their respective columns. Then the first […] symbols of $z', z''$ are 0, and the conclusion holds for $k = 1$.

Case 2. Exactly one of the points, say $x'$, is in the top block or level $H + 1$ of its column, so $z'$ consists of 0's. Note that in $y'' = \tilde{f}_{n+1}^{(\infty)}(x'')$ the block $a^*$ appears somewhere between index 1 and […], hence there is a $0 < k <$ […] with $\tilde{f}_{n+1}^{(n+1)}(x'') - \tilde{f}_{n+1}^{(n+1)}(T^k x'') = 0$. If we replace $\tilde{f}_{n+1}$ with $f_{n+1}$ the left hand side changes by at most $\rho$ and we get

\[ \|f_{n+1}^{(n+1)}(x'') - f_{n+1}^{(n+1)}(T^k x'')\|_\infty \le \rho. \]

On the other hand,

\[ \|f_{n+1}^{(n+1)}(x') - f_{n+1}^{(n+1)}(T^k x')\|_\infty < \rho \]

because the first […] symbols of the itinerary of $x'$ are 0; as desired.

Case 3. $x', x''$ are in different columns or the same column but at least […] levels apart, and neither is in the top block or top level. By looking at the blocks to which $x', x''$ belong and to the next block, by lemma 3.3 we see that for every pair of $(n+1)$-blocks, and in particular the one appearing at the start of the itineraries of $x', x''$, there is a $k$ in the range we want so that these blocks appear again in the $\tilde{f}_{n+1}$ itinerary of both $x'$ and $x''$ at index $k$. As in case 2, this gives the conclusion for the $f_{n+1}$ itinerary because again the change from $\tilde{f}_{n+1}$ to $f_{n+1}$ is "too slow" to affect the inequality very much.

Case 4. $x', x''$ belong to the same column and are within […] levels of each other. If they are in one of the bottom $\frac{1}{\rho}$ blocks then we are done by the periodicity of these blocks (again, there is some slow "drift" which does not affect us). Otherwise, the initial $(n+1)$-block of both itineraries belongs to $Y$. We claim that there is an $M$ so that either for some $0 < i < M$ the points $T^i x', T^i x''$ belong to different columns but not to the top or bottom $\frac{1}{\rho}$ blocks of those columns, or else there exists a $k < M$ as desired. This suffices because in the former case we can argue as in case 3, and deduce that as $k$ ranges over $1, \ldots, M +$ […], every pair of $(n+1)$-blocks from $Y$ appears at index $k$ in the $\tilde{f}_{n+1}$-itineraries of $x', x''$. This gives the conclusion we want.

It remains to show that there is such an $M$. This follows from the fact that $\tilde{f}_{n+1}^{(\infty)}(X)$ is a minimal symbolic system. Indeed, suppose the contrary. Then for every $M$ there exist points $x'_M, x''_M \in X$ so that whenever $1 \le i \le M$ and $T^i x'_M, T^i x''_M$ are in different columns it is because they are within […] of the top or bottom of a column, and also the initial $(n+1)$-blocks of the itineraries of $x'_M, x''_M$ do not appear again together before time $M$. We may assume that $x'_M \to x'$ and $x''_M \to x''$. Now $x', x''$ have these properties as well, for all $M$. Assuming as we may that $x'$ is above $x''$ in the column they belong to, it follows that the itinerary of $x'$ is a shift of the itinerary of $x''$, so the pair $(\tilde{f}_{n+1}^{(\infty)}(x'), \tilde{f}_{n+1}^{(\infty)}(x''))$ of points of $\tilde{f}_{n+1}^{(\infty)}(X)$ is of the form $(y, T^r y)$ for some $r$; since $\tilde{f}_{n+1}^{(\infty)}(X)$ is minimal this point must be recurrent, a contradiction. This completes the proof of theorem 1.1.

3.3. Partitions derived from continuous functions and predictable $\mathbb{Z}^d$ actions. In this section we prove a purely measure-theoretic and topological lemma which involves no dynamics. Let $X$ be a normal topological space and $\mu$ a regular probability measure on the Borel $\sigma$-algebra of $X$. The entropy and conditional entropy of finite and countable partitions is defined as usual. For finite or countable measurable partitions $\mathcal{P} = (P_1, P_2, \ldots)$ and $\mathcal{Q} = (Q_1, Q_2, \ldots)$ of $X$ with finite entropy, the Rokhlin metric is defined by

\[ d(\mathcal{P}, \mathcal{Q}) = H(\mathcal{P} \mid \mathcal{Q}) + H(\mathcal{Q} \mid \mathcal{P}) \]

This metric has the property that if $\mathcal{P} = (P_1, P_2, \ldots)$ and we define $\mathcal{P}^{(n)} = (P_1, \ldots, P_n, \bigcup_{k=n+1}^{\infty} P_k)$, then $\mathcal{P}^{(n)} \to \mathcal{P}$ in $d$.
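
For finite partitions the Rokhlin metric can be computed directly from the masses of the intersections of atoms, using $H(\mathcal{P} \mid \mathcal{Q}) = H(\mathcal{P} \vee \mathcal{Q}) - H(\mathcal{Q})$. The following is a minimal sketch; the representation of a pair of partitions by a matrix `joint` with `joint[i][j]` equal to the measure of $P_i \cap Q_j$, and the function names, are assumptions made for illustration. Entropy is taken with natural logarithms.

```python
import math

def entropy(probs):
    """Shannon entropy (natural log) of a collection of atom masses."""
    return -sum(p * math.log(p) for p in probs if p > 0)

def rokhlin_distance(joint):
    """d(P, Q) = H(P|Q) + H(Q|P), where joint[i][j] = mu(P_i ∩ Q_j)."""
    h_joint = entropy(p for row in joint for p in row)   # H(P v Q)
    h_p = entropy(sum(row) for row in joint)             # H(P)
    h_q = entropy(sum(col) for col in zip(*joint))       # H(Q)
    return (h_joint - h_q) + (h_joint - h_p)

# Identical two-set partitions are at distance 0; independent fair ones at 2 log 2.
d_same = rokhlin_distance([[0.5, 0.0], [0.0, 0.5]])
d_indep = rokhlin_distance([[0.25, 0.25], [0.25, 0.25]])
```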



We say that a partition $\mathcal{P}$ is continuous if there is a continuous function $f \in C(X)$ which is constant almost surely on each atom of $\mathcal{P}$.

Proposition 3.4. The continuous partitions are dense with respect to the Rokhlin metric in the space of finite-entropy countable partitions.

Proof. The proof is a variation on Urysohn's lemma, which states that given two closed disjoint sets $C_0, C_1$ in a normal space, there is a continuous function $0 \le f \le 1$ such that $f^{-1}(0) = C_0$ and $f^{-1}(1) = C_1$.

Let $D \subseteq \mathbb{Q} \cap [0, 1]$ be the dyadic rationals. Let $\mathcal{P} = (P_0, P_1)$ be a partition into two sets and let $\varepsilon > 0$. We construct a continuous function $f : X \to [0, 1]$ with $\mu(\bigcup_{r \in D} f^{-1}(r)) = 1$ such that for the countable partition $\mathcal{Q} = \{f^{-1}(r) : r \in D\}$ we have $d(\mathcal{P}, \mathcal{Q}) < \varepsilon$. The proof in case $\mathcal{P}$ has more than two atoms is similar; this is sufficient, because the finite partitions are dense in the Rokhlin metric.

We construct a family of open sets $\{U_r\}_{r \in D}$ with $\overline{U_r} \subseteq U_s$ for $r < s$ and with $\mu(\partial U_r) = 0$. We will also define closed sets $(C_r)$ such that $C_s \subseteq U_t \setminus U_r$ for all $r < s < t$, and $\mu(\bigcup C_r) = 1$. We will then define $f$ by

\[ f(x) = \inf(\{1\} \cup \{r : x \in U_r\}) \]

This defines a continuous function with $f|_{C_r} = r$, and so $\{f^{-1}(x) : x \in [0, 1]\}$ equals $\{C_r\}$ up to measure 0.

Fix a sequence $(\varepsilon_k)$ to be determined later. For $i = 0, 1$ let $C_i \subseteq P_i$ be closed sets with null boundary and measure $\mu(C_i) > (1 - \varepsilon_1)\mu(P_i)$. Set $U_0 = \emptyset$ and $U_1 = X \setminus C_1$.

Let $D_k \subseteq D$ be the set of reduced dyadic rationals with denominator $2^k$. We proceed by induction on $k$, defining at each step the sets $U_r, C_r$ for $r \in D_k$ under the assumption that they have been defined already for $r \in \bigcup_{j<k} D_j$. Write $E_k = \bigcup_{j<k} D_j = \{r_1, \ldots, r_n\}$ with $r_1 < \ldots < r_n$ and let $r \in D_k$. Then there are $r', r'' \in E_k$ with $r' < r < r''$ and $(r', r'') \cap E_k = \emptyset$. Let $V = U_{r''} \setminus \overline{U_{r'}}$ and choose a closed set $C_r \subseteq V$ with $\mu(C_r) > (1 - \varepsilon_k)\mu(V) = (1 - \varepsilon_k)\mu(U_{r''} \setminus U_{r'})$. Choose $U_r$ so that it contains $C_r \cup \overline{U_{r'}}$, it has $\mu(\partial U_r) = 0$ and $\overline{U_r} \subseteq U_{r''}$.

Write $\mathcal{Q} = \{C_r\}_{r \in D}$. Set $C^k = \bigcup_{j \ge k} \bigcup_{r \in D_j} C_r$ and let $\mathcal{Q}_k = \{C_r\}_{r \in E_k} \cup \{C^k\}$ be the partition obtained by merging all the atoms $C_r$ in $\mathcal{Q}$ with $r \in \bigcup_{j \ge k} D_j$. Let $C_k = \bigcup_{r \in D_k} C_r$. The sequence $(\varepsilon_k)$ controls the convergence of the sequence $(\mu(C^k))$ and the latter can be made to converge arbitrarily quickly. In particular we can guarantee that $\mathcal{Q}$ has finite entropy. Now $\mathcal{Q}_k \to \mathcal{Q}$ in the Rokhlin metric, so

\[ d(\mathcal{P}, \mathcal{Q}) = \lim_{k \to \infty} d(\mathcal{P}, \mathcal{Q}_k) \le \lim_{k \to \infty} \Big( d(\mathcal{P}, \mathcal{Q}_1) + \sum_{i=1}^{k-1} d(\mathcal{Q}_i, \mathcal{Q}_{i+1}) \Big) = d(\mathcal{P}, \mathcal{Q}_1) + \sum_{i=1}^{\infty} d(\mathcal{Q}_i, \mathcal{Q}_{i+1}) \]

and the last line can be made arbitrarily small by prudent choice of $(\varepsilon_k)$, since $\mathcal{Q}_{k+1}$ refines $\mathcal{Q}_k$ by splitting $C^k$ into at most $2^k$ atoms whose relative mass is determined by $\varepsilon_k$. □

We can now prove theorem 1.3. Note that even for $d = 1$ the proof is more direct than that given in [5].

Proof (of theorem 1.3). Let $\mathbb{Z}^d$ act on $X$ and suppose that for every $f \in C(X)$ one has

$$f \in \langle 1, T^u f : u < 0 \rangle$$

where $<$ is the lexicographical order on $\mathbb{Z}^d$. This implies that $f$ is measurable with respect to the $\sigma$-algebra generated by $\{T^u f : u < 0\}$, and in particular this shows that for any $T$-invariant measure $\mu$ there is a dense (in the Rokhlin metric) set of partitions $\mathcal{Q}$ for which $h(\mathcal{Q}) = 0$, namely those which come from continuous functions (proposition 3.4). Since $h(\mu, \mathcal{P})$ is continuous in $\mathcal{P}$ under the Rokhlin metric we conclude that $h(\mu, \mathcal{P}) = 0$ for every two-set partition and hence $h(\mu) = 0$. By the variational principle, $h_{top}(T) = 0$. □

4. Prediction in symbolic systems 

4.1. Generalities about subshifts and prediction. Let $\Sigma$ be a finite alphabet, $\sigma : \Sigma^{\mathbb{Z}} \to \Sigma^{\mathbb{Z}}$ the shift transformation. For $x \in \Sigma^{\mathbb{Z}}$ set $x^- = (\ldots, x_{-2}, x_{-1})$ and for a subshift $X \subseteq \Sigma^{\mathbb{Z}}$ let $X^- = \{x^- : x \in X\}$. A finite or right-infinite word $a$ is an extension of $x^- \in X^-$ if $x^- a$ appears in $X$. Let $L(X)$ be the set of finite words appearing in $X$ and $L_m(X) = L(X) \cap \Sigma^m$.
The following fact is well-known:

Lemma 4.1. A subshift $X$ is the union of periodic orbits if and only if every $x^- \in X^-$ extends uniquely to $x \in X$.

Proof. If $X$ is a finite union of periodic orbits the conclusion is clear.

For the converse we rely on the fact that, if there is some $n$ such that $x_{-n}, \ldots, x_{-1}$ determines $x_0$ for all $x \in X$, then $X$ is a finite union of periodic orbits. Suppose then that $X \subseteq \Sigma^{\mathbb{Z}}$ is not the union of periodic orbits. Then for every $n$ there is a word $a_n \in L_n(X)$ and distinct symbols $u_n, v_n \in \Sigma$ such that $a_n u_n, a_n v_n \in L_{n+1}(X)$, so that there are words $b_n, c_n \in \Sigma^{\mathbb{N}^+}$ beginning with $u_n, v_n$ respectively such that $a_n b_n, a_n c_n$ appear in $X$. By compactness we can choose a subsequence $n(k)$ such that $u = u_{n(k)}$ and $v = v_{n(k)}$ are constant, $a_{n(k)} \to x^- \in X^-$, $b_{n(k)} \to b \in \Sigma^{\mathbb{N}^+}$ and $c_{n(k)} \to c \in \Sigma^{\mathbb{N}^+}$. But then $b, c$ begin with the distinct symbols $u, v$ and $x^- b, x^- c$ appear in $X$, so $x^-$ has at least two extensions in $X$. □
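The dichotomy in lemma 4.1 can be seen on finite data. The following is a minimal sketch, not part of the original argument: it counts, for a long finite sample of a sequence, the words of length $n$ that are followed by two different symbols. The helper names (`factors`, `right_special`, `fibonacci_prefix`) and the choice of the Fibonacci word as an aperiodic test case are assumptions made for illustration.

```python
def factors(word, n):
    """Set of length-n subwords of a finite word."""
    return {word[i:i + n] for i in range(len(word) - n + 1)}


def right_special(word, n):
    """Length-n subwords that are followed by at least two distinct symbols."""
    followers = {}
    for w in factors(word, n + 1):
        followers.setdefault(w[:n], set()).add(w[n])
    return {u for u, symbols in followers.items() if len(symbols) >= 2}


def fibonacci_prefix(steps):
    """Prefix of the Fibonacci word, built with the substitution 0 -> 01, 1 -> 0."""
    w = "0"
    for _ in range(steps):
        w = "".join("01" if ch == "0" else "0" for ch in w)
    return w


periodic = "011" * 200      # sample of a single periodic orbit
fib = fibonacci_prefix(18)  # sample of an aperiodic minimal sequence

# Number of words of length n with more than one possible next symbol.
periodic_counts = [len(right_special(periodic, n)) for n in range(3, 10)]
fib_counts = [len(right_special(fib, n)) for n in range(1, 10)]
```

For the periodic sample no word of length at least the period has two continuations, while for the aperiodic sample such a word exists at every length tested; the compactness argument in the proof turns these arbitrarily long ambiguous words into a single past with two extensions.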

Thus every nontrivial subshift, including zero-entropy ones, has at least one past with multiple extensions. On the other hand, the following observation was pointed out to us by B. Weiss. Note that it is a special case of the general fact that minimal systems are invertible on a dense $G_\delta$.

Lemma 4.2. If $X$ is a minimal subshift then for every $a \in L(X)$ and $k \in \mathbb{N}$ there is a word $b \in L(X)$ such that $ba \in L(X)$, and every occurrence of $ba$ in $X$ is followed by a unique word $c \in L_k(X)$.

Proof. It suffices to show this for $k = 1$, as the general case then follows by induction. Let $a \in L(X)$ and $u \in \Sigma$ such that $au \in L(X)$. Consider all $b$'s such that $bu \in L(X)$ and $au$ appears in $bu$ exactly twice, as a front segment and a back segment. By minimality the lengths of such $b$'s are bounded above and we can choose a maximal such $b$. If $x^+ \in X^+$ and $bx^+ \in X^+$, then by minimality $au$ appears in $x^+$; thus by maximality of $b$ we must have $x^+(1) = u$, for otherwise there is a front segment $c$ of $x^+$ such that $au$ appears in $bc$ only as a front and back segment, which is impossible by maximality of $b$. Thus $b$ is always followed by $u$ in $X$. □

Corollary 4.3. If $X$ is a minimal subshift and $u \in L(X)$ then there is a word $v \in L(X)$ so that every occurrence of $v$ is followed by $u$.

Proof. Let $u$ be given, and let $k$ be large enough that every $c \in L_k(X)$ contains $u$. Apply the previous lemma with $a$ the empty word, and let $b, c$ be the words obtained. Then $b$ is always followed by $c$, and $c = c'uc''$ for some words $c', c''$. The word $v = bc'$ has the desired property. □

4.2. Realization theorem. We now begin the proof of theorem 1.5. We begin with a measure preserving system $(X, \mathcal{B}, \mu, T)$ of entropy zero, and wish to construct a strictly ergodic subshift supporting an isomorphic measure, and in which each past has at most two futures. By [6], we may assume that $\mu$ is an invariant measure on a uniquely ergodic minimal subshift $X \subseteq \{0, 1\}^{\mathbb{Z}}$, and that $X$ is topologically mixing. We may assume $\mu$ is aperiodic; otherwise the statement is trivial.

We construct a sequence of two-set generating clopen partitions $\mathcal{P}_n$ for $n = 0, 1, 2, \ldots$ such that $\mathcal{P}_n \to \mathcal{P}^*$, where $\mathcal{P}^*$ generates for $\mu$. Denote by $X_n$ the symbolic system arising from $X$ and $\mathcal{P}_n$. Note that since $\mathcal{P}_n$ is clopen, $X_n$ is minimal and uniquely ergodic. The two-sided $\mathcal{P}_n$-name of a point $x \in X$ is a point in $X_n$.

We will define a sequence of integers $m(n) > n$ so that $L_{m(n)}(X_n) = L_{m(n)}(X_{n+1})$, and another sequence $k(n) > n$ with the property that for every $u \in \Sigma^{k(n)}$,

$$\#\{w \in \Sigma^n : uw \in L(X_n)\} \le 2;$$

these numbers will satisfy $m(n) > k(n) + n$, so that the system $X^*$ arising from $\mathcal{P}^*$ will have the property that for every $u \in \Sigma^{k(n)}$,

$$\#\{w \in \Sigma^n : uw \in L(X^*)\} \le 2.$$

This implies the desired result. By choosing the $m(n)$ large enough at each stage, we can furthermore guarantee that $X^*$ is minimal and uniquely ergodic, but we do not go into the details of this.

The construction is by induction. Define $\mathcal{P}_0$ to be the clopen generating partition according to the 0-th symbol, set $m(0) = 0$ and $k(0) = 0$.

We describe now the inductive step of the construction. We are given a two-set generating partition $\mathcal{P}_n$ of $X$ into clopen sets and an integer $m(n)$. Given $\varepsilon > 0$ we will construct a new partition $\mathcal{P}_{n+1}$ which is $\varepsilon$-close to $\mathcal{P}_n$. We will ensure that $L_{m(n)}(X_n) = L_{m(n)}(X_{n+1})$ and define an integer $k(n+1)$ as above. Finally we will be free to choose $m(n+1)$ arbitrarily, since it only affects the next step of the construction.

Let $Y_n$ be the shift of finite type whose allowed blocks of length $m(n) + 1$ are those appearing in $L_{m(n)+1}(X_n)$. Since $X_n$ is infinite and $X_n \subseteq Y_n$ it follows from basic properties of shifts of finite type that $Y_n$ has positive entropy. Using the fact that $X_n$ is mixing and has zero entropy (whereas $Y_n$ has positive entropy) we can find a word $a \in L_{m(n)+1}(X_n)$, a word $b_{old} \in L(X_n)$ and a word $b_{new} \in L(Y_n) \setminus L(X_n)$ such that $b_{old}, b_{new}$ have the same length and both begin and end with the word $a$.

The partition $\mathcal{P}_{n+1}$ will be constructed by replacing some of the occurrences of $b_{old}$ in $X_n$ with $b_{new}$. This is done as follows. First, using the lemma, choose $c \in L(X_n)$ so that every time $c$ appears in $X_n$ it is followed by $b_{old}$. We can extend $c$ backwards arbitrarily while preserving this property, so we may assume that $c$ is arbitrarily long. Since $X_n$ is minimal, there is an $R$ such that the gap between occurrences of $c$ in $X_n$ is at most $R$.

Next, choose a large $N$ (how large will depend on $R$, $\ell(b_{old})$ and on the growth of words in the system $X_n$, and will be explained below) and choose a clopen bounded Rokhlin-Kakutani tower in $X_n$ all of whose columns are of height $N - 1$ or $N$. Purify each column of the tower according to the clopen partition $\bigvee_i T^{-i}\mathcal{P}_n$. Consider one such column, which corresponds to the $\mathcal{P}_n$-name $w$. We proceed to modify the $\mathcal{P}_n$-name of the column; doing this for each column defines a new partition $\mathcal{P}_{n+1}$.

Let $i(1)$ denote the location of the first occurrence of $cb_{old}$ in $w$, let $i(2)$ be the index of the next occurrence which does not intersect the first occurrence, and so on till $i(r)$. Replace the occurrences of $cb_{old}$ at indices $i(2), i(3)$ and $i(r-1), i(r)$ with $cb_{new}$.

Using the syndeticity of occurrences of $c$, for some $\alpha > 0$ we have $r > \alpha N$, where $\alpha$ depends on $R$. We next encode the $\mathcal{P}_n$-name of the atom of $\bigvee_i T^{-i}\mathcal{P}_n$ to which the base of the tower belongs by replacing the word $cb_{old}$ at some of the levels $i(5), i(7), \ldots, i(r-3)$ with $cb_{new}$. We will use only locations $i(j)$ where $j$ is odd; thus the consecutive occurrences of $cb_{new}$ at the top and bottom of the column are unique and serve to identify it. Note that if several $cb_{new}$'s appear consecutively in the $\mathcal{P}_{n+1}$-name of a point, then they are in groups of two or three or five, where the last possibility arises when the top marker of a column is followed immediately by a bottom marker of the following column.

The reason we can encode the $\bigvee_i T^{-i}\mathcal{P}_n$-names of the column in the approximately $\frac{1}{2}\alpha N$ bits available is that by zero entropy of $X_n$, the number of $\bigvee_i T^{-i}\mathcal{P}_n$-names is $\le 2^{\alpha N/4}$ assuming $N$ is large enough.

We have defined a partition $\mathcal{P}_{n+1}$. Note that we have modified $w$ along a set of density at most $\ell(b_{old}) / \ell(cb_{old})$, which can be made arbitrarily small by making $c$ long; thus $\mathcal{P}_{n+1}$ can be made $\varepsilon$-close to $\mathcal{P}_n$.

Since $b_{new}$ does not appear in $L(X_n)$, we can recover the $\mathcal{P}_n$-name of a point $x \in X$ simply by replacing every occurrence of $cb_{new}$ with $cb_{old}$. Thus, since $\mathcal{P}_n$ generates, so does $\mathcal{P}_{n+1}$.
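The recoverability step can be illustrated on a toy example. This is a sketch under stated assumptions, not the construction itself: the "old name" is a prefix of the Fibonacci word (which never contains `11`), and the words `C`, `B_OLD`, `B_NEW` as well as the helpers `occurrences`, `mark` and `recover` are hypothetical choices. It only demonstrates that marking is invertible when the new word never occurs in the old name; it does not model the tower or the requirement that $c$ always be followed by $b_{old}$.

```python
def fibonacci_prefix(steps):
    """Prefix of the Fibonacci word (0 -> 01, 1 -> 0); it never contains '11'."""
    w = "0"
    for _ in range(steps):
        w = "".join("01" if ch == "0" else "0" for ch in w)
    return w


def occurrences(text, pattern):
    """Start indices of non-overlapping occurrences of pattern, left to right."""
    found, i = [], text.find(pattern)
    while i != -1:
        found.append(i)
        i = text.find(pattern, i + len(pattern))
    return found


def mark(text, c, b_old, b_new, keep_every=2):
    """Replace b_old by b_new after c at every keep_every-th occurrence of c + b_old."""
    sites = occurrences(text, c + b_old)[::keep_every]
    chars = list(text)
    for p in sites:
        start = p + len(c)
        chars[start:start + len(b_new)] = b_new
    return "".join(chars), sites


def recover(text, c, b_old, b_new):
    """Undo the marking by replacing every c + b_new with c + b_old."""
    return text.replace(c + b_new, c + b_old)


# Toy words of equal length that begin and end with the same symbol.
C, B_OLD, B_NEW = "10", "0100", "0110"

old_name = fibonacci_prefix(12)
new_name, sites = mark(old_name, C, B_OLD, B_NEW)
recovered = recover(new_name, C, B_OLD, B_NEW)
```

Because `B_NEW` contains `11`, every occurrence of `C + B_NEW` in the new name sits exactly at a marked site, so the replacement in `recover` restores the old name.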

Because $b_{old}, b_{new}$ agree on their first and last $m(n)$ symbols, and because $b_{new} \in L(Y_n)$ and all $m(n)$-blocks in $Y_n$ are in $L_{m(n)}(X_n)$, we also have $L_{m(n)}(X_n) \subseteq L_{m(n)}(X_{n+1})$.

Consider a point $x \in X$. We will show that by looking $2N$ symbols into the past of the $\mathcal{P}_{n+1}$-name of $x$, we can determine that the $\mathcal{P}_{n+1}$-name of $x$ from time 1 to $\ell(b_{new})$ takes on one of at most two possible values. Thus setting $k(n) = 2N$ and noting that $\ell(b_{new}) > m(n) > n$ we will have completed the inductive step.

Look into the $\mathcal{P}_{n+1}$-past of $x$ until we find a sequence of either exactly two or exactly five consecutive occurrences of $cb_{new}$; this must happen after at most $N$ symbols at some index $i$. Looking back at most $N$ symbols more we find the next group of two or five consecutive $cb_{new}$'s at some index $j$. Between $j$ and $i$ we have coded the $\mathcal{P}_n$-name of $x$ from time $j$ to time $j + 3N$ (and even a little bit more). In any case, assuming as we may that $N > \ell(b_{new})$, and since $j > -2N$, we can certainly recover the $\mathcal{P}_n$-name of $x$ from time $j$ to time $\ell(b_{new})$.

We now claim that there are at most two choices for the $\mathcal{P}_{n+1}$-name of $x$ from time 1 to $m(n+1)$. Note that the $\mathcal{P}_n$-name of $x$ and the $\mathcal{P}_{n+1}$-name of $x$ differ only at points which lie in the $\ell(b_{new})$ symbols following certain occurrences of $c$. But if some such occurrence of $c$ intersects the $\mathcal{P}_n$-name of $x$ from times $-\ell(b_{new}) + 1$ to $\ell(b_{new})$, then from space considerations there is a unique such $c$; and in this case the next $\ell(b_{new})$ symbols are either $b_{new}$ or $b_{old}$. Thus there are at most two possible choices for the atom of $\bigvee_{i=1}^{m(n)+1} T^{-i}\mathcal{P}_{n+1}$ to which $x$ belongs.

This completes the discussion of the induction step. By choosing $\varepsilon$ small enough at each stage we can arrange that $\mathcal{P}_n \to \mathcal{P}^*$ with $\mathcal{P}^*$ a generating partition for $\mu$, and $X^*$ will be 2-branching. By a proper choice of $m(n)$ and using the unique ergodicity and minimality of $X$ (and hence of all the $X_n$), we can also ensure that $X^*$ is minimal and uniquely ergodic.

5. An extremely non-invertible zero-entropy system

5.1. Generalities. In this section we address the relation between entropy and the 
structure of preimage sets of points in non-invertible topological systems. The moti- 
vation for this is the following simple fact, whose proof is a good illustration of why 
one expects there to be a connection between entropy and large preimage sets: 

Proposition 5.1. A system with no small preimages has entropy at least log 2. 

Proof. Let $(X, T)$ be a system and $\delta > 0$ so that for every $x \in X$ there are $x', x'' \in T^{-1}(x)$ with $d(x', x'') > \delta$. We can define functions $\tau_0, \tau_1 : X \to X$ so that $\tau_0(x), \tau_1(x) \in T^{-1}(x)$ and $d(\tau_0(x), \tau_1(x)) > \delta$; note that $\tau_0, \tau_1$ need not be continuous. For $n \in \mathbb{N}$ and a sequence $a = a_n a_{n-1} \ldots a_1 \in \{0, 1\}^n$ let

$$\tau_a(x) = \tau_{a_n}(\tau_{a_{n-1}}(\ldots \tau_{a_1}(x) \ldots)).$$

Note that $T(\tau_a(x)) = \tau_b(x)$ where $b \in \{0, 1\}^{n-1}$ is obtained by deleting the first symbol of $a$.

For a fixed $x \in X$ consider the set

$$A_n(x) = \{\tau_a(x) : a \in \{0, 1\}^n\}.$$

If $a, b \in \{0, 1\}^n$ and $a \ne b$ then there is a maximal index $i < n$ such that $a_j = b_j$ for $1 \le j \le i$ but $a_{i+1} \ne b_{i+1}$. Let $y = \tau_{a_i a_{i-1} \ldots a_1}(x) = \tau_{b_i b_{i-1} \ldots b_1}(x)$; then

$$T^{n-i-1}(\tau_a(x)) = \tau_{a_{i+1}}(y), \qquad T^{n-i-1}(\tau_b(x)) = \tau_{b_{i+1}}(y),$$

so $d(T^{n-i-1}\tau_a(x), T^{n-i-1}\tau_b(x)) > \delta$. It follows that all the points in $A_n(x)$ are distinct and the set $A_n(x)$ is $(n, \delta)$-separated; since this is true for all $n$, this implies that $h(X, T) \ge \log 2$. □
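The separation argument can be checked numerically in a familiar example. The sketch below is an illustration only, assuming the doubling map on the circle with preimage branches $x/2$ and $(x+1)/2$ (an example chosen here, not one discussed in the text); it builds the $2^n$ points $\tau_a(x)$ with exact rational arithmetic and verifies that they are $(n, \delta)$-separated for $\delta = 1/4$.

```python
from fractions import Fraction
from itertools import combinations, product


def T(x):
    """Doubling map on the circle [0, 1)."""
    return (2 * x) % 1


def tau(bit, x):
    """The two preimage branches of T; they lie at circle distance 1/2."""
    return (x + bit) / 2


def circle_dist(x, y):
    d = abs(x - y)
    return min(d, 1 - d)


def preimage_tree(x, n):
    """All points tau_a(x) for a in {0,1}^n (symbols applied one after another)."""
    points = []
    for a in product((0, 1), repeat=n):
        y = x
        for bit in a:
            y = tau(bit, y)
        points.append(y)
    return points


def is_separated(points, n, delta):
    """True if every pair is more than delta apart at some time 0 <= j < n."""
    for p, q in combinations(points, 2):
        separated = False
        for _ in range(n):
            if circle_dist(p, q) > delta:
                separated = True
                break
            p, q = T(p), T(q)
        if not separated:
            return False
    return True


n = 6
points = preimage_tree(Fraction(1, 3), n)
separated = is_separated(points, n, Fraction(1, 4))
```

Here the number of $(n, \delta)$-separated points doubles with each step, which is the growth rate $\log 2$ appearing in the proposition.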

One easy consequence of this is that for finite alphabets $\Sigma$ every extremely non-invertible subshift of $\Sigma^{\mathbb{Z}}$ has entropy at least $\log 2$, because once a metric is fixed there is a $\delta$ such that every two distinct preimages of a point are $\delta$ apart.

As was mentioned in the introduction, J. Bobok has shown that for maps of the interval if a map is $k$-to-one then it has entropy $\ge \log k$ [1].

It is not hard to construct examples of zero entropy systems where every point has multiple preimages, but it is not so easy to construct such a system with a globally supported ergodic measure, and Eli Glasner has asked whether this is possible. The construction below gives an affirmative answer to this question.

5.2. The construction. Let $\sigma$ be the shift on the one-sided Bebutov system $[0, 1]^{\mathbb{N}}$. We will construct a subshift of the Bebutov system by specifying a point $x^* \in [0, 1]^{\mathbb{N}}$ and taking its orbit closure $X = \overline{\{\sigma^n x^*\}_{n \in \mathbb{N}}}$. Things will be engineered so that $X$ has zero topological entropy, and $x^*$ is generic for an ergodic measure $\mu$ on $X$ having support $X$.

For words $x, y \in [0, 1]^{\mathbb{N}}$ we set

$$d(x, y) = \sum_{i=1}^{\infty} |x(i) - y(i)| \cdot 2^{-i};$$

this defines a metric on $[0, 1]^{\mathbb{N}}$ which is compatible with the compact product topology. We also write

$$\|x\| = d(x, 0)$$

where $0 = (0, 0, \ldots)$. For a finite word $x$ we define

$$\|x\| = \sum_{i=1}^{\ell(x)} |x(i)| \cdot 2^{-i} = \inf\{\|y\| : y \in [0, 1]^{\mathbb{N}} \text{ and } x \text{ is a front segment of } y\}.$$

Note that $\|ab\| \ge \|a\|$ and that if $x_n$ are finite words and $x_n \to x \in [0, 1]^{\mathbb{N}}$ in the obvious sense then $\|x_n\| \to \|x\|$.

Suppose $x \in [0, 1]^*$ is a finite word. We define $\theta_0(x), \theta_1(x) \in [0, 1]$ by

0o(x) = I \\x\\ , 9i(x) = \ \\x\\ 

o 4 
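and we define $\tau_0, \tau_1 : [0, 1]^* \to [0, 1]^*$ by

$$\tau_0(x) = \theta_0(x)x, \qquad \tau_1(x) = \theta_1(x)x,$$

i.e. the symbols $\theta_i(x)$ are appended to the beginning of $x$.

For a sequence $b = b_M b_{M-1} \ldots b_1 \in \{0, 1\}^M$ define $\tau_b$ inductively by

$$\tau_{b_M \ldots b_1}(x) = \tau_{b_M}(\tau_{b_{M-1} \ldots b_1}(x))$$

and set $\tau_\emptyset(x) = x$. Note that if $b = b_M \ldots b_1$ then

$$\sigma(\tau_b(x)) = \tau_{b_{M-1} \ldots b_1}(x)$$

and in particular $\sigma^M(\tau_b(x)) = x$. One verifies that $\|\tau_b(x)\| \to 0$ exponentially as the length of $b$ tends to $\infty$, uniformly in $b$ and $x$.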
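The two properties just stated can be checked on finite words. In the following sketch the operators prepend a symbol of the form (coefficient) $\cdot \|x\|$; the two coefficients in `COEFF` are placeholder values assumed for illustration and should not be read as the constants of the construction. With any coefficient $c$ one has $\|\tau(x)\| = \frac{1+c}{2}\|x\|$, which is the source of the exponential decay.

```python
from fractions import Fraction

# Placeholder coefficients for theta_0 and theta_1 (an assumption of this sketch).
COEFF = (Fraction(1, 8), Fraction(1, 4))


def norm(word):
    """||x|| = sum over i >= 1 of |x(i)| * 2^(-i) for a finite word x."""
    return sum(abs(s) * Fraction(1, 2 ** i) for i, s in enumerate(word, start=1))


def tau(bit, word):
    """Prepend the symbol theta_bit(word) = COEFF[bit] * ||word||."""
    return (COEFF[bit] * norm(word),) + tuple(word)


def tau_b(b, word):
    """Apply tau_{b_M ... b_1}, where b = (b_M, ..., b_1); b_1 is applied first."""
    for bit in reversed(b):
        word = tau(bit, word)
    return tuple(word)


def shift(word, m=1):
    """The shift sigma^m on finite words: drop the first m symbols."""
    return tuple(word[m:])


x = (Fraction(1, 2), Fraction(1, 3))
b = (1, 0, 1, 1)
y = tau_b(b, x)

shift_undoes_tau = shift(y, len(b)) == x
decay_bound = (Fraction(1) + max(COEFF)) / 2  # ||tau(x)|| <= decay_bound * ||x||
```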

We define $\tau_b$ on $[0, 1]^{\mathbb{N}}$ by the same formula. In the subshift we are about to construct the preimage set of a point $x$ will contain at least $\tau_0(x), \tau_1(x)$. Since $\|\tau_b(x)\| \to 0$ as $\ell(b) \to \infty$ the preimage tree of each point will be "narrow", and not contribute to the entropy. Note however that there will also be preimages which do not come from applications of $\tau_b$.

We construct $x^*$ recursively. At the $n$-th stage we will be given a finite word $x_n$ of length $L_n$ and construct a word $x_{n+1}$ of length $L_{n+1}$ such that $x_{n+1} = x_n x'_n$ for some word $x'_n$. We then take $x^*$ to be the limit of this increasing sequence of finite words.

We begin with an arbitrary finite word $x_0$ of length $L_0 > 0$. Our only assumption about $x_0$ is that it is strictly positive.

The passage from stage $n$ to $n + 1$ is as follows. Given $x_n$ of length $L_n$, for $0 < k \le L_n$ let $w_k$ be the back segment of $x_n$ starting at index $k$, that is,

$$w_k = x_n(k) x_n(k+1) \ldots x_n(L_n)$$

so $\ell(w_k) = L_n - k + 1$. For $b \in \{0, 1\}^{3^{L_n}}$ set

$$w_{b,k} = \tau_b(w_k).$$

Define $y_n$ to be some concatenation of the words $w_{b,k}$ as $b$ varies over $\{0, 1\}^{3^{L_n}}$ and $0 < k \le L_n$ (the order is not important).

Now choose a large integer $M_n$ which we will specify later. For now we note that $M_n$ may be chosen to depend not only on all the previous stages but also on $y_n$. Define

$$x_{n+1} = \underbrace{x_n x_n \ldots x_n}_{M_n \text{ times}} \, y_n.$$

Set $x^* = \lim x_n$ and let $X$ be the orbit closure of $x^*$. In the next few subsections we will show that $(X, \sigma)$ has the advertised properties.
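To make the bookkeeping of one stage concrete, here is a small sketch of the passage from $x_0$ to $x_1$. It is only an illustration under assumptions: the coefficients in `COEFF`, the starting word `x0` and the repetition count `repeats` (standing in for $M_0$, which the text chooses very large) are hypothetical values, and the words $w_{b,k}$ are concatenated in an arbitrary fixed order.

```python
from fractions import Fraction
from itertools import product

# Placeholder coefficients for theta_0, theta_1 (an assumption of this sketch).
COEFF = (Fraction(1, 8), Fraction(1, 4))


def norm(word):
    """||x|| = sum over i >= 1 of x(i) * 2^(-i) for a finite nonnegative word."""
    return sum(s * Fraction(1, 2 ** i) for i, s in enumerate(word, start=1))


def tau_b(b, word):
    """Apply tau_{b_M ... b_1} with b = (b_M, ..., b_1); b_1 is applied first."""
    word = tuple(word)
    for bit in reversed(b):
        word = (COEFF[bit] * norm(word),) + word
    return word


def next_stage(x_n, repeats):
    """Return x_{n+1}: `repeats` copies of x_n followed by y_n."""
    L = len(x_n)
    y_n = []
    for b in product((0, 1), repeat=3 ** L):  # b ranges over {0,1}^(3^L)
        for k in range(1, L + 1):             # back segment starting at index k
            y_n.extend(tau_b(b, x_n[k - 1:]))
    return tuple(x_n) * repeats + tuple(y_n)


x0 = (Fraction(1, 2), Fraction(1, 3))  # strictly positive starting word, L_0 = 2
x1 = next_stage(x0, repeats=3)
```

With $L_0 = 2$ there are $2^9$ sequences $b$ and two back segments, so $y_0$ already has length $512 \cdot (11 + 10)$; the lengths $L_n$ grow extremely fast, which is why only one stage is computed here.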

5.3. $(X, \sigma)$ is extremely non-invertible. The point $x^*$ has been constructed in such a way that if some finite word $a$ appears in $x^*$ then it appears in at least two different configurations, preceded by symbols $r, r' \in [0, 1]$ such that $|r - r'| \ge \kappa \|a\|$, where $\kappa > 0$ denotes the difference between the coefficients of $\|x\|$ in the definitions of $\theta_0$ and $\theta_1$. This is because if $a$ is a subword of $x_n$ then $a$ is a front segment of some back segment $b$ of $x_n$, and so $\tau_0(b)$ and $\tau_1(b)$ appear in $x_{n+1}$, and by definition the first symbols of $\tau_0(b)$ and $\tau_1(b)$ differ by $\kappa \|b\|$, and $\|b\| \ge \|a\|$.

Thus if $y$ is a limit point of $\{\sigma^n x^*\}$ and $y \ne 0$, then $y$ is a limit of finite subwords $a_n$ of $x^*$, and since $\|y\| > c > 0$ for some $c$ we have that $\|a_n\| > c$ for all large enough $n$. Therefore we can find symbols $r'_n, r''_n \in [0, 1]$ such that $|r'_n - r''_n| > \kappa c$ and $r'_n a_n, r''_n a_n$ appear in $x^*$. Passing to a subsequence we get that $r'_n a_n \to r'y$ and $r''_n a_n \to r''y$ for some $r', r'' \in [0, 1]$ with $|r' - r''| \ge \kappa c$, and so $r'y, r''y$ are distinct preimages of $y$ in $X$.

It remains to check that $0$ has two preimages (it is clear from the construction that $0 \in X$, since $x^*$ has arbitrarily long sequences of small numbers, consisting of front segments of the $w_{b,k}$). Since $0$ is a fixed point of $\sigma$, one preimage is $0$ itself. To see that there are other preimages, note that the words $x_n$ all end in the same positive letter $e$, the last letter of $x_0$, and this is also the last letter of all the words $w_{b,k}$ we constructed at each stage. On the other hand as $\ell(b) \to \infty$ the front segments of $w_{b,k}$ approach $0$, so there are arbitrarily long sequences of arbitrarily small numbers in $x^*$, each sequence preceded by an occurrence of $e$. Thus $e000\ldots$ is also a preimage of $0$ in $X$.

5.4. $(X, \sigma)$ has zero topological entropy. We verify this by estimating the number of $\varepsilon$-separated orbits. For words $a, a'$ (either finite or infinite) we write

$$\|a - a'\|_\infty = \sup_i |a(i) - a'(i)|.$$

Note that for $x, x' \in X$,

$$\|x|_{[1,n]} - x'|_{[1,n]}\|_\infty > \varepsilon \implies \max\{d(\sigma^i x, \sigma^i x') : i = 1, \ldots, n\} > \varepsilon.$$

Fix $\varepsilon > 0$, and let $A_n$ be the set of all subwords of $x^*$ of length $n$. Set

$$C_\varepsilon(n) = \max\{|A| : A \subseteq A_n, \ \forall a \ne a' \in A \ \ \|a - a'\|_\infty > \varepsilon\}.$$

The topological entropy of $(X, \sigma)$ is

$$\lim_{\varepsilon \to 0} \limsup_{n \to \infty} \frac{1}{n} \log C_\varepsilon(n).$$

For a finite or infinite word $a$ with symbols in $[0, 1]$, let $[a]_\varepsilon$ denote the word $b$ of the same length such that

$$b(i) = [a(i)/\varepsilon] \cdot \varepsilon$$

(here $[r]$ denotes the integer part of $r$). Thus the coordinates of $[a]_\varepsilon$ belong to the finite set $\{0, \varepsilon, 2\varepsilon, \ldots, [1/\varepsilon]\varepsilon\}$. Note that if $\|a - a'\|_\infty > \varepsilon$ then $\|[a]_{\varepsilon/2} - [a']_{\varepsilon/2}\|_\infty > \varepsilon/2$. It is therefore sufficient to prove the following:

Claim 5.2. For every $\varepsilon > 0$, the number of length $n$ subwords of $[x^*]_{\varepsilon/2}$ which are at least $\varepsilon/2$ apart in $\|\cdot\|_\infty$ grows sub-exponentially with $n$.
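The reduction to a finite alphabet rests on the rounding inequality stated just before the claim. A minimal sketch of that step, with the hypothetical helper names `quantize` and `sup_dist` and exact rational arithmetic to avoid floating-point rounding:

```python
from fractions import Fraction
from math import floor


def quantize(word, eps):
    """[a]_eps: replace each coordinate a(i) by floor(a(i) / eps) * eps."""
    return tuple(floor(a / eps) * eps for a in word)


def sup_dist(a, b):
    """Sup-norm distance between two words of equal length."""
    return max(abs(s - t) for s, t in zip(a, b))


eps = Fraction(1, 5)
a = (Fraction(1, 10), Fraction(9, 10), Fraction(1, 2))
a_prime = (Fraction(1, 10), Fraction(1, 2), Fraction(1, 2))

original_gap = sup_dist(a, a_prime)
rounded_gap = sup_dist(quantize(a, eps / 2), quantize(a_prime, eps / 2))
```

Rounding down to a multiple of $\varepsilon/2$ moves each coordinate by less than $\varepsilon/2$, so a gap larger than $\varepsilon$ between two coordinates stays larger than $\varepsilon/2$ after rounding.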

We will use the following property of $x^*$:

Lemma 5.3. For every $n$ we can write $x^* = a_1 a_2 a_3 \ldots$, where each $a_i$ is of length at least $3^{L_n}$ and for each $i$, either

(1) $a_i = x_n$, or
(2) For each $1 \le j \le \ell(a_i) - L_n$ we have $a_i(j) < \rho \, a_i(j+1)$ for a fixed constant $\rho < 1$.

In particular, for any $\varepsilon > 0$, for $n$ large enough each $a_i$ is either equal to $x_n$ or else all the coordinates of $a_i$, except the last $2L_n$ coordinates, are of magnitude $< \varepsilon$.

The proof of the lemma is an elementary induction from the definitions, and is omitted.

Proof (of claim 5.2). Fix $\varepsilon > 0$ and let $z^* = [x^*]_{\varepsilon/2}$ and $z_m = [x_m]_{\varepsilon/2}$. From the lemma, we see that for the given $\varepsilon$ and for large enough $m$ we can write

$$z^* = v_1 v_2 v_3 \ldots$$

and for each $i$ the word $v_i$ is either equal to $z_m$, or else $\ell(v_i) \ge 3^{L_m}$ and at least a $(1 - 2^{-L_m})$-fraction of the coordinates of $v_i$ are 0. In view of this, the fact that the number of subwords of $z^*$ of length $n$ grows sub-exponentially is now a standard counting argument, and the claim follows. This shows that $h_{top}(X, \sigma) = 0$. □



5.5. $x^*$ is generic for a globally-supported measure $\mu$ on $X$. A point $y$ in a dynamical system $(Y, S)$ is a generic point for a measure $\mu$ if for every continuous function $f \in C(Y)$ it holds that $\lim_{N \to \infty} \frac{1}{N} \sum_{n=1}^{N} f(S^n y)$ exists. When this is true then $\frac{1}{N} \sum_{n=1}^{N} \delta_{S^n y}$ converges in the weak-* topology to an invariant measure $\mu$ on $Y$ (here $\delta_x$ is the point mass at $x$). One condition that guarantees that $y$ is generic is that for every open set $U \subseteq Y$ the averages $\lim_{N \to \infty} \frac{1}{N} \sum_{n=1}^{N} 1_U(S^n y)$ exist; in fact it is sufficient to verify this for $U$ coming from a basis for the topology of $Y$.
For $U \subseteq [0, 1]^k$, let

$$[U] = U \times [0, 1]^{\{k+1, k+2, \ldots\}} \subseteq [0, 1]^{\mathbb{N}}$$

be the cylinder determined by $U$. Sets of this form for open $U$ constitute a basis for the topology of $[0, 1]^{\mathbb{N}}$. We will show that for every such $U$, the sequence of averages

$$(5.1) \qquad p(m) = \frac{1}{m} \sum_{i=1}^{m} 1_{[U]}(\sigma^i x^*)$$

converges. This implies that the weak-* limit measure

$$\mu = \lim_{n \to \infty} \frac{1}{n} \sum_{i=1}^{n} \delta_{\sigma^i x^*}$$

exists, and is a shift-invariant measure on $X$. In fact, we will show that $\mu(U) > 0$ if and only if $p(n) > 0$ for some $n$. From this it will follow that $\mu$ has global support in $X$.

For a finite word $a$ we will say that $a \in [U]$ if $ab \in [U]$ for every infinite $b \in [0, 1]^{\mathbb{N}}$. Thus if $a \in [U]$ then $ab \in [U]$ for every finite $b$. The property $a \in [U]$ depends only on the first $k$ coordinates of $a$ (recall that $U \subseteq [0, 1]^k$). Note that if $\ell(a) < k$ it is possible that $a \notin [U]$ but that $ab \in [U]$ for some (finite or infinite) $b$.

Claim 5.4. Let $U \subseteq [0, 1]^k$ and $p(n)$ as above. The limit $\lim_{s \to \infty} p(L_s)$ exists; furthermore, if $p(n) > 0$ for some $n$ then the limit is positive.

Proof. If $\sigma^n x^* \notin [U]$ for every $n$ then clearly $\lim p(n) = 0$. Therefore we must check only the case when $\sigma^n x^* \in [U]$ for some $n$. Note that in this case, $p(m) > 0$ for all $m \ge n$. We prove first that $p(L_r)$ converges as $r \to \infty$, and then the general claim.

For a word $a$, let $I(a)$ be the number of indices $0 \le n < \ell(a)$ such that $\sigma^n a \in [U]$. If we let $a_m$ be the front $m$-segment of $x^*$, we have

$$\frac{I(a_m)}{m} \le p(m) \le \frac{I(a_m) + k}{m}$$

(the right inequality is because of edge effects; it is possible for $\sigma^n a \notin [U]$ but $\sigma^n x^* \in [U]$ if $\ell(a) - k < n < \ell(a)$). In particular, for any $r$ we have

$$(5.2) \qquad \frac{I(x_r)}{L_r} \le p(L_r) \le \frac{I(x_r) + k}{L_r}.$$

If $p(L_r) > 0$ then also $p(L_{r+1}) > 0$, and $x_{r+1}$ contains at least $M_r$ copies of $x_r$. Thus if we assume that $M_s > 2^s$ for every $s$, we may fix $r$ such that $I(x_s) > 2^s$ for every $s > r$.

For an $s$ as above, write

$$x_{s+1} = x_s x_s \ldots x_s y_s$$

as in the construction of $x_{s+1}$, with the $x_s$'s repeating $M_s$ times. We can write $I(x_{s+1}) = I_1 + I_2$, where

$$I_1 = \#\{0 \le n < M_s L_s : \sigma^n x_{s+1} \in [U]\}, \qquad I_2 = \#\{M_s L_s \le n < L_{s+1} : \sigma^n x_{s+1} \in [U]\}.$$

We have

$$M_s \cdot I(x_s) \le I_1 \le M_s \cdot (I(x_s) + k)$$

since we may gain at most $M_s k$ occurrences at the edges of the $x_s$'s but we can't lose occurrences. Also we have the trivial bound $I_2 \le \ell(y_s)$. Therefore

$$M_s I(x_s) \le I(x_{s+1}) \le M_s (I(x_s) + k) + \ell(y_s)$$

and substituting this and $L_{s+1} = M_s L_s + \ell(y_s)$ into inequality (5.2) we get

$$\frac{M_s \cdot I(x_s)}{M_s L_s + \ell(y_s)} \le p(L_{s+1}) \le \frac{M_s \cdot I(x_s) + \ell(y_s) + (M_s + 1)k}{M_s L_s + \ell(y_s)}.$$

Dividing the middle term by $p(L_s)$ and using (5.2) again, we get

$$\frac{1}{1 + k/I(x_s)} \cdot \frac{1}{1 + \ell(y_s)/M_s L_s} \le \frac{p(L_{s+1})}{p(L_s)} \le \frac{1 + k/I(x_s) + (\ell(y_s) + k)/M_s I(x_s)}{1 + \ell(y_s)/M_s L_s}.$$

We saw above that $k/I(x_s)$ is exponentially small in $s$. Thus if $\{M_n\}$ grows quickly enough, both the expression on the left, which we denote $\alpha_s$, and the expression on the right, which we denote $\beta_s$, converge to 1 rapidly enough for their products to converge to a finite positive number. Now the relation $\alpha_s \le \frac{p(L_{s+1})}{p(L_s)} \le \beta_s$ and the fact that $0 < \prod_{s \ge r} \alpha_s, \prod_{s \ge r} \beta_s < \infty$ imply that $p(L_s)$ converges to a positive limit as $s \to \infty$. □
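The algebra behind the two-sided bound on $p(L_{s+1})/p(L_s)$ can be double-checked with exact arithmetic. The sketch below is only a sanity check with made-up sample values for $I(x_s)$, $k$, $L_s$, $M_s$ and $\ell(y_s)$ (the function names and numbers are assumptions): it confirms that the smallest and largest ratios allowed by (5.2) and by the bounds on $p(L_{s+1})$ coincide with the expressions $\alpha_s$ and $\beta_s$.

```python
from fractions import Fraction


def ratio_bounds(I, k, L, M, ell_y):
    """The expressions alpha_s and beta_s bounding p(L_{s+1}) / p(L_s)."""
    alpha = 1 / (1 + Fraction(k, I)) / (1 + Fraction(ell_y, M * L))
    beta = (1 + Fraction(k, I) + Fraction(ell_y + k, M * I)) / (1 + Fraction(ell_y, M * L))
    return alpha, beta


def extreme_ratios(I, k, L, M, ell_y):
    """Extreme ratios obtained directly from the bounds on p(L_s) and p(L_{s+1})."""
    low_s, high_s = Fraction(I, L), Fraction(I + k, L)
    L_next = M * L + ell_y
    low_next = Fraction(M * I, L_next)
    high_next = Fraction(M * I + ell_y + (M + 1) * k, L_next)
    return low_next / high_s, high_next / low_s


# Hypothetical sample values, chosen only to exercise the formulas.
sample = dict(I=10, k=3, L=40, M=100, ell_y=7)
alpha, beta = ratio_bounds(**sample)
lowest, highest = extreme_ratios(**sample)
```

Both bounds approach 1 when $I(x_s)$ and $M_s$ are large compared with $k$ and $\ell(y_s)$, which is what makes the infinite products in the proof converge.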

Claim 5.5. For $U$ and $p(n)$ as above, $\lim_{n \to \infty} p(n)$ exists and is positive if $p(n) > 0$ for some $n$.

Proof. Let $p = \lim p(L_s)$, the limit of $p(n)$ along the subsequence $L_s$. To show that $p(n) \to p$, we show that if $L_s \le n < L_{s+1}$ then $p(n)/p(L_{s-1})$ is close to 1, in a manner depending on $s$ and tending to 1 with $s$. To see this, recall that

$$x_{s+1} = (x_s x_s \ldots x_s) y_s = ((x_{s-1} \ldots x_{s-1} y_{s-1}) \ldots (x_{s-1} \ldots x_{s-1} y_{s-1})) y_s.$$

Write $a_n$ for the front $n$-segment of $x_{s+1}$. Then there is a unique way to write $a_n$ as

$$a_n = (x_s \ldots x_s)(x_{s-1} \ldots x_{s-1}) w$$

with $w$ a front segment of either $x_{s-1}$, $y_{s-1}$ or $y_s$.

For $n \ge L_s$ the number of $x_s$'s appearing is at least 1. Now consider two alternatives: If $w$ is a front segment of either $x_{s-1}$ or $y_{s-1}$ then $\ell(w)$ is negligible compared to $\ell(a_n)$ because $\ell(a_n) \ge \ell(x_s) \ge M_{s-1}\ell(x_{s-1})$ and $M_{s-1}$ has been chosen large. On the other hand if $w$ is a front segment of $y_s$ then all $M_s$ repetitions of $x_s$ appear in $a_n$, and again we have that $\ell(w)$ is negligible compared to $\ell(a_n)$.

An estimate like the one carried out for $p(L_s)$ shows that we can ignore edge effects and write $p(n)$ as some weighted average of $p(L_s)$ and $p(L_{s-1})$. But we know already that $p(L_s)/p(L_{s-1}) \to 1$, so $p(n) \approx p(L_{s-1}) \to p$. □



5.6. The only ergodic measures on $X$ are $\mu$ and the point mass $\delta_0$. A priori the measure $\mu$, for which $x^*$ is generic, need not be ergodic. Rather than prove directly that $\mu$ is ergodic, we will show that if $\nu$ is any ergodic measure on $(X, \sigma)$ then $\nu$ is a convex combination of $\mu$ and $\delta_0$. This implies that $\mu$ is an extreme point of the convex set of invariant measures on $X$, so it is ergodic and is the only ergodic measure on $X$ other than $\delta_0$.

Theorem 5.6. The only ergodic measures for $(X, \sigma)$ are $\mu$ and $\delta_0$.

Proof. Using lemma 5.3 we can select a sequence $r(n) \to \infty$ and write

$$x^* = b_{1,n} b_{2,n} b_{3,n} \ldots$$

such that each $b_{i,n}$ is either equal to $x_{r(n)}$, or has the property that $\ell(b_{i,n}) \ge 3^{L_{r(n)}}$ and all but the final $2L_{r(n)}$ coordinates are $< 1/n$.

If $\nu$ is an ergodic measure for $(X, \sigma)$ then for some sequence with $m(n) - k(n) \to \infty$ we have

$$\nu = \lim_{n \to \infty} \frac{1}{m(n) - k(n)} \sum_{i=k(n)}^{m(n)} \delta_{\sigma^i x^*}$$

(this follows from the fact that by the ergodic theorem $\nu$ has generic points, and these can be approximated arbitrarily well by shifts $\sigma^i(x^*)$ of $x^*$). By passing to subsequences we can assume that $m(n) - k(n) \ge 2^{L_{r(n)}}$; denote $w_n = x^*|_{[k(n), m(n)]}$ so that $\ell(w_n) \ge 2^{L_{r(n)}}$. Write $\lambda_n$ for the fraction of indices $i = 1, \ldots, \ell(w_n)$ such that $i$ is in a word $b_{j,n}$ with $b_{j,n} = x_{r(n)}$. We may further assume, by passing to a subsequence, that $\lambda_n \to \lambda \in [0, 1]$.

Now we can write $w_n = b' b_{i(n),n} \ldots b_{j(n),n} b''$ for some $i(n) \le j(n)$ and $b', b''$ as short as possible. Notice that if $b_{i(n)-1,n}$ or $b_{j(n)+1,n}$ are $x_{r(n)}$ then their lengths, respectively, are negligible (logarithmic) compared to $\ell(w_n)$, and so also are the lengths of $b', b''$, respectively. On the other hand, if $b_{i(n)-1,n}$ is not $x_{r(n)}$ and if the length of $b'$ is not negligible compared to $\ell(w_n)$, then that word is made up almost entirely of coordinates of magnitude less than $1/n$. Similar reasoning holds for $b''$. It is now simple to verify the following:

• If $\lambda = 0$ then for large $n$ most of $w_n$ is made up of coordinates of magnitude $< 1/n$, so in this case we have $\nu = \delta_0$.

• If $\lambda = 1$, then for large $n$, the distribution of words of length $\sqrt{L_{r(n)}}$ in $w_n$ is very close to their distribution in $x_{r(n)}$, and since $r(n) \to \infty$ we have $\nu = \mu$ in this case.

• Finally for $0 < \lambda < 1$ the same reasoning as above shows that

$$\nu = \lambda \mu + (1 - \lambda)\delta_0$$

(note that because the lengths of the $b_{i,n}$ tend to infinity with $n$, the statistics of subwords of $w_n$ of length $\sqrt{L_{r(n)}}$ are only very slightly affected by the places where two $b_{i,n}$'s meet). Since we assumed that $\nu$ is ergodic, this is impossible.

Thus $\nu = \delta_0$ or $\nu = \mu$. Since $\mu \ne \delta_0$ this implies that $\mu$ is ergodic. This completes the proof. □

5.7. Further comments. This example is optimal in the following sense. Any minimal system $(X, T)$ has the property that on some dense $G_\delta$ subset of $X$ the preimage of any point is a single point. Thus there are no minimal extremely non-invertible
systems. Thus if we want an extremely non-invertible system supporting a global 
ergodic measure we cannot hope for a uniquely ergodic example. The example we 
have given is the next best thing: it has only two invariant measures and a unique 
minimal subsystem, the fixed point 0. 

The construction can be modified in several ways. For instance one can guarantee that the preimage set of every point is large: by augmenting the two functions $\theta_0, \theta_1$ at each stage of the construction with other functions it is not hard to make the preimage set of every point of cardinality $2^{\aleph_0}$. By modifying $\theta_0, \theta_1$ in a more complex way one can replace the minimal subsystem $\{0\}$ with other systems.

Finally, in the construction we defined words $w_{b,k} = \tau_b(w_k)$ where $b$ varies over all 0,1-valued sequences of a fixed length. By varying this length in a "random" way the measure $\mu$ can be made to be weakly mixing, and perhaps even strongly mixing.